For these analyses, I begun exploring grade level offerings and school enrollment trends over time. I hypothesized that trends will stay relatively stable, and overall, that is what we see in these data. However, in future analysis, we may decide to pull out some of these outlying schools to examine if there are any unusual trends that we see for them in terms of practices selected, catalysts, etc. For modeling purposes, we can also identify if/any outliers are particularly influential.
Demographic analysis to follow, but happy to take suggestions and guidance.
I first selected grade level offerings by school by year to conduct a within-schools analysis. Most schools only completed this portion of the survey one or fewer times (grade level count for these is 4,256). These left much fewer schools/grades to examine trends.
dat <- long_dat %>%
select(school_id, year, starts_with("grades")) %>%
select(-c(4:6, 21:22)) #not select elem, mid, high, & other columns
grades_long <- dat %>%
pivot_longer(
cols = starts_with("grades_"),
names_to = "grade_level",
values_to = "offered"
)
grades_wide <- grades_long %>%
pivot_wider(names_from = "year",
values_from = "offered") %>%
mutate(yes_report_count = rowSums(select(., `2019`:`2024`), na.rm = TRUE),
years_reported = rowSums(!is.na(select(., `2019`:`2024`))))
grades_change <- grades_wide %>%
mutate(change_status = case_when(years_reported == 0 ~ "No Data",
yes_report_count == 0 & years_reported > 0 ~ "Never Offered",
yes_report_count == years_reported ~ "Always Offered"))
library(purrr)
changes_reported <- grades_wide %>%
rowwise() %>%
mutate(change_status = {
# Extract relevant columns and remove NAs
values <- na.omit(c_across(3:7))
# Check the patterns of change
if (length(values) > 1) { # Ensure there is more than one value to compare changes
added <- any(diff(values == 1) == 1) # Checks for a change from 0 to 1
dropped <- any(diff(values == 1) == -1) # Checks for a change from 1 to 0
if (added && dropped) {
"Waffled"
} else if (added) {
"Added"
} else if (dropped) {
"Dropped"
} else {
"Stable" # No change in status
}
} else {
"One or fewer data points" # Not enough data to determine a change
}
}) %>%
ungroup()
changes_reported_by_grade <- changes_reported %>%
group_by(grade_level, change_status) %>%
summarise(count = n()) %>%
mutate(grade_level = sub("grades_", "", grade_level))Specifically, these are the values:
##
## Added Dropped One or fewer data points
## 148 27 4535
## Stable Waffled
## 3299 1
Below, these values are also visualized.
changes_reported_by_grade$grade_level <- factor(changes_reported_by_grade$grade_level, levels = c("pk","k", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "past_12"))
changes_reported_by_grade %>%
ggplot() +
geom_col(aes(x = grade_level, y = count, fill = change_status), position = "dodge") +
coord_flip() +
labs(x = "Grade Level",
y = "Count",
legend = "Change Status",
title = "Change Status Within Schools by Grade Level") +
scale_fill_manual(values = transcend_cols2)4/30 Modification: Looks like you structured these by each point given to a grade (correct me if that’s wrong), but curious about remaking the graph tallying schools rather than observation–i.e., show us how many schools expanded or limited grade offerings and which grades were expanded/offered in those cases
# Introduce counts at the school level
changes_reported_by_school <- changes_reported %>%
group_by(school_id, grade_level, change_status) %>%
summarise(count = n(), .groups = "drop") %>%
distinct(school_id, grade_level, change_status) %>%
count(grade_level, change_status) %>%
mutate(grade_level = sub("grades_", "", grade_level))
# Update grade levels to be factors with the specific order
changes_reported_by_school$grade_level <- factor(
changes_reported_by_school$grade_level,
levels = c("pk","k", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "past_12")
)
# Visualization: Plotting the number of schools by grade level and change status
ggplot(changes_reported_by_school, aes(x = grade_level, y = n, fill = change_status)) +
geom_col(position = "dodge", width = .7) +
coord_flip() +
labs(
x = "Grade Level",
y = "Number of Schools",
title = "Changes in Grade Offerings by Schools",
subtitle = "Count of schools by change status in grade offerings",
fill = "Change Status"
) +
scale_fill_manual(values = transcend_cols2) +
theme(legend.position = "top",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8)) +
guides(fill = guide_legend(ncol = 5))We can see that the majority of schools (300+) had only one or fewer data points. I’ll remove them for the viz below.
changes_reported_by_school %>%
filter(change_status != "One or fewer data points") %>%
ggplot(aes(x = grade_level, y = n, fill = change_status)) +
geom_col(position = "dodge") +
coord_flip() +
labs(
x = "Grade Level",
y = "Number of Schools",
title = "Changes in Grade Offerings by Schools",
subtitle = "Count of schools by change status in grade offerings",
fill = "Change Status"
) +
scale_fill_manual(values = transcend_cols) +
theme(legend.position = "right")changes_reported_by_school %>%
filter(change_status != "One or fewer data points") %>%
filter(change_status != "Stable") %>%
ggplot(aes(x = grade_level, y = n, fill = change_status)) +
geom_col(position = "dodge", width = .7) +
coord_flip() +
labs(
x = "Grade Level",
y = "Number of Schools",
title = "Changes in Grade Offerings by Schools",
subtitle = "Count of schools by change status in grade offerings",
fill = "Change Status"
) +
scale_fill_manual(values = transcend_cols) +
theme(legend.position = "top",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8)) +
guides(fill = guide_legend(ncol = 3))Similarly, school enrollment within schools have seemed to remain mostly stagnant. The total schools in the survey, followed by schools with 3+ and 5 years of data, are visualized below.
dat <- long_dat %>%
select(school_id, year, school_enrollment) %>%
na.omit()
enrollment_wide <- dat %>%
pivot_wider(
names_from = "year",
values_from = "school_enrollment"
)
enrollment_dat <- dat %>%
group_by(school_id) %>%
filter(n() >1) %>%
ungroup()
enrollment_dat %>%
ggplot(aes(x = year, y = school_enrollment, group = school_id)) +
geom_point(color = transcend_cols[1], alpha = 0.6) +
geom_line(color = transcend_cols[1], alpha = 0.4) +
labs(title = "School Enrollment by Wave",
x = "Wave Number",
y = "School Enrollment") Note: 239 schools have at least two years of enrollment data. Let’s look at schools with more years of data
enrollment_dat %>%
group_by(school_id) %>%
filter(n()>2) %>%
ungroup() %>%
ggplot(aes(x = year, y = school_enrollment, group = school_id)) +
geom_point(color = transcend_cols[1], alpha = 0.6) +
geom_line(color = transcend_cols[1], alpha = 0.4) +
labs(title = "School Enrollment by Wave with 3+ years of data",
x = "Wave Number",
y = "School Enrollment") enrollment_dat %>%
group_by(school_id) %>%
filter(n()>4) %>%
ungroup() %>%
ggplot(aes(x = year, y = school_enrollment, group = school_id)) +
geom_point(color = transcend_cols[1], alpha = 0.6) +
geom_line(color = transcend_cols[1], alpha = 0.4) +
labs(title = "School Enrollment by Wave with 5 years of data",
x = "Wave Number",
y = "School Enrollment") Notably, school_id 234 has reported jumping up to 10,000 students in 2023 from only 298 the year before. This is the Virtual Learning School in Exeter, NH.
4/30 modification: I like these plots–wondering if we can add an annotation in the plot for that outlier to name the school. && The graphs are a little hard to see any trends because one of the schools is such an outlier for enrollment growth. Could you a) omit the outlier to see if the graphs yield more interesting takeaways, and b) explore a way to characterize enrollment change over time in a different way - e.g., on average schools’ enrollment grew or shrank by X?
enrollment_viz <- enrollment_dat %>%
group_by(school_id) %>%
filter(n()>3) %>%
ungroup() %>%
ggplot(aes(x = year, y = school_enrollment, group = school_id)) +
geom_point(color = transcend_cols[1], alpha = 0.6) +
geom_line(color = transcend_cols[1], alpha = 0.4) +
annotate(
"text", label = "Note: The Virtual Learning Academy in Exeter, NH \nhas reported jumping up to 10,000 students \nin 2023 from 298 enrolled the year before.",
x = 2021, y = 7350, size = 4.5, colour = "red"
) +
labs(title = "School Enrollment by Wave with 4+ years of data",
x = "Wave Number",
y = "School Enrollment")
enrollment_vizenrollment_sans_out_viz <- enrollment_dat %>%
group_by(school_id) %>%
filter(n()>3) %>%
filter(school_id != 234) %>%
ungroup() %>%
ggplot(aes(x = year, y = school_enrollment, group = school_id)) +
geom_point(color = transcend_cols[1], alpha = 0.6) +
geom_line(color = transcend_cols[1], alpha = 0.4) +
labs(title = "School Enrollment by Wave with 4+ years of data",
subtitle = "School ID 234 (outlier Virtual Learning Academy) was removed.",
x = "Wave Number",
y = "School Enrollment")
enrollment_sans_out_vizdat %>%
group_by(school_id) %>%
filter(n()>3) %>%
ungroup() %>%
ggplot(aes(x = year, y = pct_ell)) +
geom_smooth(alpha = 0.3, color = "darkgray") +
geom_point(aes(group = school_id), color = transcend_cols2[1], alpha = 0.6) +
geom_line(aes(group = school_id), color = transcend_cols2[1], alpha = 0.4) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(title = "Pct. English Language Learner \nClassification by Wave",
subtitle = "Restricted to Schools with 4+ years of data",
x = "Wave Number",
y = "Pct. ELL") dat %>%
group_by(school_id) %>%
filter(n()>3) %>%
ungroup() %>%
ggplot(aes(x = year, y = pct_swd)) +
geom_smooth(alpha = 0.3, color = "darkgray") +
geom_point(aes(group = school_id), color = transcend_cols[2], alpha = 0.6) +
geom_line(aes(group = school_id), color = transcend_cols[2], alpha = 0.4) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(title = "Pct. Students with Disabilities \nClassification by Wave",
subtitle = "Restricted to Schools with 4+ years of data",
x = "Wave Number",
y = "Pct. SWD") Note: We know that the measurement of FRPL is an increasingly unreliable metric, with more schools automatically enrolled and fewer requiring individual enrollment and student-level self-disclosure of FRPL status. While I include the chart below, I do not interpret these analyses due to high measurement error.
Source: Click here for more information.
dat %>%
group_by(school_id) %>%
filter(n()>3) %>%
ungroup() %>%
ggplot(aes(x = year, y = pct_frpl)) +
geom_smooth(alpha = 0.3, color = "darkgray") +
geom_point(aes(group = school_id), color = transcend_cols[1], alpha = 0.6) +
geom_line(aes(group = school_id), color = transcend_cols[1], alpha = 0.4) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(title = "Pct. Free or Reduced-Price Lunch \nClassification by Wave",
subtitle = "Restricted to Schools with 4+ years of data",
x = "Wave Number",
y = "Pct. FRPL") dat %>%
group_by(school_id) %>%
filter(n()>3) %>%
ungroup() %>%
ggplot(aes(x = year, y = pct_bipoc)) +
geom_smooth(alpha = 0.3, color = "darkgray") +
geom_point(aes(group = school_id), color = transcend_cols[3], alpha = 0.6) +
geom_line(aes(group = school_id), color = transcend_cols[3], alpha = 0.4) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(title = "Pct. Black, Indigenous, and People of Color \nClassification by Wave",
subtitle = "Restricted to Schools with 4+ years of data",
x = "Wave Number",
y = "Pct. BIPOC") Looks like school ID 105 is doing this weird jump where it goes from less than 1% BIPOC to 100% BIPOC. This is Juab High School in Nephi, Utah. Upon looking at the race breakdown, this looks like a coding error. I will just omit from the analysis for now.
dat %>%
group_by(school_id) %>%
filter(n()>3) %>%
ungroup() %>%
filter(school_id != 105) %>%
ggplot(aes(x = year, y = pct_bipoc)) +
geom_smooth(alpha = 0.3, color = "darkgray") +
geom_point(aes(group = school_id), color = transcend_cols[3], alpha = 0.6) +
geom_line(aes(group = school_id), color = transcend_cols[3], alpha = 0.4) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(title = "Pct. Black, Indigenous, and People of Color \nClassification by Wave",
subtitle = "Restricted to Schools with 4+ years of data",
x = "Wave Number",
y = "Pct. BIPOC")